
    Normalized Web Distance and Word Similarity

    There is a great deal of work in cognitive psychology, linguistics, and computer science on using word (or phrase) frequencies in context in text corpora to develop measures of word similarity or word association, going back to at least the 1960s. The goal of this chapter is to introduce the normalized web distance (NWD) method for determining similarity between words and phrases. It is a general way to tap the amorphous low-grade knowledge available for free on the Internet, typed in by local users aiming at personal gratification of diverse objectives, and yet globally achieving what is effectively the largest semantic electronic database in the world. Moreover, this database is available to all via any search engine that can return aggregate page-count estimates for a large range of search queries. In the paper introducing the NWD it was called the `normalized Google distance (NGD),' but since Google no longer allows automated searches, we opt for the more neutral and descriptive NWD.
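
    As a concrete aside (not from the chapter itself), the NWD between two search terms x and y is computed from aggregate page counts; a minimal Python sketch of the standard formula, where f(x), f(y), and f(x, y) are page counts returned by a search engine and N is the total number of indexed pages:

        import math

        def nwd(fx, fy, fxy, n):
            # fx, fy: page counts for each term alone; fxy: count of pages
            # containing both terms; n: total pages indexed by the engine.
            log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
            return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))

        # Illustrative counts of the order reported in the original NGD paper
        # for "horse" and "rider"; a small distance indicates related terms.
        print(nwd(fx=46_700_000, fy=12_200_000, fxy=2_630_000, n=8_000_000_000))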

    Hierarchical structuring of Cultural Heritage objects within large aggregations

    Huge amounts of cultural content have been digitised and are available through digital libraries and aggregators like Europeana.eu. However, it is not easy for a user to get an overall picture of what is available, nor to find related objects. We propose a method for hierarchically structuring cultural objects at different similarity levels. We describe a fast, scalable clustering algorithm with an automated field selection method for finding semantic clusters. We report a qualitative evaluation of the cluster categories based on records from the UK and a quantitative one on the results from the complete Europeana dataset. Comment: The paper has been published in the proceedings of the TPDL conference, see http://tpdl2013.info. For the final version see http://link.springer.com/chapter/10.1007%2F978-3-642-40501-3_2
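
    The abstract does not spell out the clustering algorithm; purely as a hypothetical sketch of what "structuring at different similarity levels" can look like, the following builds one agglomerative dendrogram over record feature vectors (random placeholders here, not the paper's field-selection output) and cuts it at several thresholds with SciPy:

        import numpy as np
        from scipy.cluster.hierarchy import fcluster, linkage

        # Placeholder feature vectors standing in for selected metadata fields
        # of cultural-heritage records; real input would be e.g. TF-IDF vectors.
        rng = np.random.default_rng(0)
        records = rng.standard_normal((100, 50))

        # Build the dendrogram once, then cut it at several distance thresholds
        # to obtain a hierarchy of clusters, from coarse to fine.
        tree = linkage(records, method="average", metric="cosine")
        for threshold in (1.1, 1.0, 0.9):
            labels = fcluster(tree, t=threshold, criterion="distance")
            print(f"cut at {threshold}: {labels.max()} clusters")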

    CFT Duals for Extreme Black Holes

    It is argued that the general four-dimensional extremal Kerr-Newman-AdS-dS black hole is holographically dual to a (chiral half of a) two-dimensional CFT, generalizing an argument given recently for the special case of extremal Kerr. Specifically, the asymptotic symmetries of the near-horizon region of the general extremal black hole are shown to be generated by a Virasoro algebra. Semiclassical formulae are derived for the central charge and temperature of the dual CFT as functions of the cosmological constant, Newton's constant and the black hole charges and spin. We then show, assuming the Cardy formula, that the microscopic entropy of the dual CFT precisely reproduces the macroscopic Bekenstein-Hawking area law. This CFT description becomes singular in the extreme Reissner-Nordström limit where the black hole has no spin. At this point a second dual CFT description is proposed in which the global part of the U(1) gauge symmetry is promoted to a Virasoro algebra. This second description is also found to reproduce the area law. Various further generalizations including higher dimensions are discussed. Comment: 18 pages; v2 minor change
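
    As a sanity check on the kind of matching the abstract describes, the special case of extremal Kerr (which this paper generalizes) works out as follows; the values c_L = 12J and T_L = 1/(2\pi) are the standard ones from the extremal-Kerr argument, in units G = \hbar = c = 1:

        S_{\mathrm{Cardy}} = \frac{\pi^2}{3}\, c_L T_L
                           = \frac{\pi^2}{3}\,(12J)\,\frac{1}{2\pi}
                           = 2\pi J
                           = \frac{A_{\mathrm{horizon}}}{4} = S_{\mathrm{BH}},
        \qquad A_{\mathrm{horizon}} = 8\pi J .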

    Effect of heuristics on serendipity in path-based storytelling with linked data

    Path-based storytelling with Linked Data on the Web provides users the ability to discover concepts in an entertaining and educational way. Given a query context, many state-of-the-art pathfinding approaches aim at telling a story that coincides with the user's expectations by investigating paths over Linked Data on the Web. By taking serendipity in storytelling into account, we aim to improve and tailor existing approaches so that they better fit user expectations, letting users discover interesting knowledge without feeling unsure or even lost in the story facts. To this end, we propose to optimize both the estimation of links between facts and the selection of facts in a story, increasing the consistency and relevancy of links between facts through additional domain delineation and refinement steps. In order to address multiple aspects of serendipity, we propose and investigate combinations of weights and heuristics in the paths that form the essential building blocks for each story. Our experimental findings with stories based on DBpedia indicate improvements when applying the optimized algorithm.
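
    The weights and heuristics themselves are the paper's contribution; purely as an illustration of the underlying path-based machinery, here is a minimal weighted pathfinder over a toy concept graph (all node names and edge weights below are made up), where a lower edge weight stands for a stronger, more relevant link:

        import heapq

        # Toy Linked Data-style concept graph: node -> [(neighbor, weight)].
        graph = {
            "Mozart":    [("Vienna", 0.2), ("Salieri", 0.5)],
            "Salieri":   [("Vienna", 0.3), ("Requiem", 0.9)],
            "Vienna":    [("Beethoven", 0.4)],
            "Beethoven": [("Requiem", 0.6)],
            "Requiem":   [],
        }

        def best_story_path(start, goal):
            # Dijkstra over weighted concept links; the returned path is the
            # candidate "story" with the strongest chain of links.
            frontier = [(0.0, start, [start])]
            visited = set()
            while frontier:
                cost, node, path = heapq.heappop(frontier)
                if node == goal:
                    return cost, path
                if node in visited:
                    continue
                visited.add(node)
                for neighbor, weight in graph[node]:
                    if neighbor not in visited:
                        heapq.heappush(frontier, (cost + weight, neighbor, path + [neighbor]))
            return None

        print(best_story_path("Mozart", "Requiem"))
        # (1.2, ['Mozart', 'Vienna', 'Beethoven', 'Requiem'])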

    An almost sure limit theorem for super-Brownian motion

    We establish an almost sure scaling limit theorem for super-Brownian motion on $\mathbb{R}^d$ associated with the semi-linear equation $u_t = \frac{1}{2}\Delta u + \beta u - \alpha u^2$, where $\alpha$ and $\beta$ are positive constants. In this case, the spectral theoretical assumptions required in Chen et al. (2008) are not satisfied. An example is given to show that the main results also hold for some sub-domains in $\mathbb{R}^d$. Comment: 14 pages

    Cumulants and the moment algebra: tools for analysing weak measurements

    Recently it has been shown that cumulants significantly simplify the analysis of multipartite weak measurements. Here we consider the mathematical structure that underlies this, and find that it can be formulated in terms of what we call the moment algebra. Apart from resulting in simpler proofs, the flexibility of this structure allows generalizations of the original results to a number of weak measurement scenarios, including one where the weakly interacting pointers reach thermal equilibrium with the probed system. Comment: Journal reference added, minor correction
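
    For reference (these are the standard definitions, not this paper's moment-algebra construction), the cumulants \kappa_n are generated by the logarithm of the moment generating function, and the low-order ones are simple polynomials in the raw moments \mu_n:

        \log\langle e^{tX}\rangle = \sum_{n \ge 1} \kappa_n\,\frac{t^n}{n!},
        \qquad
        \kappa_1 = \mu_1, \quad
        \kappa_2 = \mu_2 - \mu_1^2, \quad
        \kappa_3 = \mu_3 - 3\mu_1\mu_2 + 2\mu_1^3 .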

    Efficient LZ78 factorization of grammar compressed text

    We present an efficient algorithm for computing the LZ78 factorization of a text, where the text is represented as a straight line program (SLP), which is a context-free grammar in Chomsky normal form that generates a single string. Given an SLP of size $n$ representing a text $S$ of length $N$, our algorithm computes the LZ78 factorization of $S$ in $O(n\sqrt{N} + m\log N)$ time and $O(n\sqrt{N} + m)$ space, where $m$ is the number of resulting LZ78 factors. We also show how to improve the algorithm so that the $n\sqrt{N}$ term in the time and space complexities becomes either $nL$, where $L$ is the length of the longest LZ78 factor, or $(N - \alpha)$, where $\alpha \geq 0$ is a quantity which depends on the amount of redundancy that the SLP captures with respect to substrings of $S$ of a certain length. Since $m = O(N/\log_\sigma N)$, where $\sigma$ is the alphabet size, the latter is asymptotically at least as fast as a linear-time algorithm which runs on the uncompressed string when $\sigma$ is constant, and can be more efficient when the text is compressible, i.e. when $m$ and $n$ are small. Comment: SPIRE 201
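
    For readers unfamiliar with LZ78, the following is the textbook factorization run directly on an uncompressed string; the paper's algorithm computes the same output from the SLP without decompressing, so this sketch only fixes terminology:

        def lz78_factorize(s):
            # Each LZ78 factor is (index of the longest previously seen factor
            # that is a prefix of the remaining text, next character).
            dictionary = {"": 0}          # factor string -> factor index
            factors, current = [], ""
            for ch in s:
                if current + ch in dictionary:
                    current += ch          # keep extending the current match
                else:
                    factors.append((dictionary[current], ch))
                    dictionary[current + ch] = len(dictionary)
                    current = ""
            if current:                    # flush a trailing partial factor
                factors.append((dictionary[current[:-1]], current[-1]))
            return factors

        print(lz78_factorize("abababa"))   # [(0,'a'), (0,'b'), (1,'b'), (3,'a')]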

    mspecLINE: bridging knowledge of human disease with the proteome

    Background: Public proteomics databases such as PeptideAtlas contain peptides and proteins identified in mass spectrometry experiments. However, these databases lack information about human disease for researchers studying disease-related proteins. We have developed mspecLINE, a tool that combines knowledge about human disease in MEDLINE with empirical data about the detectable human proteome in PeptideAtlas. mspecLINE associates diseases with proteins by calculating the semantic distance between annotated terms from a controlled biomedical vocabulary. We used an established semantic distance measure that is based on the co-occurrence of disease and protein terms in the MEDLINE bibliographic database.

    Results: The mspecLINE web application allows researchers to explore relationships between human diseases and parts of the proteome that are detectable using a mass spectrometer. Given a disease, the tool will display proteins and peptides from PeptideAtlas that may be associated with the disease. It will also display relevant literature from MEDLINE. Furthermore, mspecLINE allows researchers to select proteotypic peptides for specific protein targets in a mass spectrometry assay.

    Conclusions: Although mspecLINE applies an information retrieval technique to the MEDLINE database, it is distinct from previous MEDLINE query tools in that it combines the knowledge expressed in scientific literature with empirical proteomics data. The tool provides valuable information about candidate protein targets to researchers studying human disease and is freely available on a public web server.
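
    The abstract names only "an established semantic distance measure" based on co-occurrence. A plausible instance (an assumption on our part; it is the same family of measures as the NWD described at the top of this list) applied to MEDLINE term counts would be

        d(t_1, t_2) = \frac{\max\{\log f(t_1), \log f(t_2)\} - \log f(t_1, t_2)}
                           {\log M - \min\{\log f(t_1), \log f(t_2)\}},

    where f(t) is the number of MEDLINE records annotated with term t, f(t_1, t_2) the number annotated with both terms, and M the total number of records.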